Implicit Regularization
A major hurdle in this study is that implicit regularization in deep learning seems to kick in only with certain types of data (not with random data, for example), and we lack mathematical tools for reasoning about real-life data. Thus one needs a simple test-bed for the investigation, where data admits a crisp mathematical formulation. Following earlier works, we focus on the problem of matrix completion: given a randomly chosen subset of entries from an unknown matrix W, the task is to recover the unseen entries. To cast this as a prediction problem, we may view each entry in W as a data point: observed entries constitute the training set, and the average reconstruction error over the unobserved entries is the test error, quantifying generalization. Fitting the observed entries is obviously an underdetermined problem with multiple solutions.
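The matrix-completion test-bed described above can be sketched in a few lines. The factorized parameterization W ≈ U @ V, the small initialization, the matrix sizes, and the step size below are illustrative assumptions rather than the paper's exact protocol; the point is only to show how observed entries become a training set and unobserved entries a test set.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 2
W_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-2 ground truth
mask = rng.random((n, n)) < 0.5                                     # randomly observed entries

# Overparameterized factorization W ≈ U @ V with full-size factors and a
# small initialization; gradient descent fits only the observed entries.
U = 0.01 * rng.standard_normal((n, n))
V = 0.01 * rng.standard_normal((n, n))

lr = 0.02
for _ in range(20000):
    R = (U @ V - W_star) * mask                 # residual on observed entries only
    U, V = U - lr * (R @ V.T), V - lr * (U.T @ R)

err = (U @ V - W_star) ** 2
train_err = err[mask].mean()                    # fit on the observed (training) entries
test_err = err[~mask].mean()                    # generalization to the unseen entries
print(train_err, test_err)
```

Since fitting the observed entries alone is underdetermined, any gap between `train_err` and `test_err` is determined entirely by the implicit bias of the optimizer and parameterization, which is exactly what makes this a clean generalization test-bed.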
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Neural networks have many successful applications, yet our theoretical understanding of them lags far behind. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with small generalization error, even though the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks, which we verify through empirical studies on synthetic data and on the MNIST dataset.
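The setting of the abstract can be illustrated with a minimal numpy sketch: a mixture of well-separated clusters, one per class, and an overparameterized two-layer ReLU network trained by plain SGD from random initialization. The cluster means, network width, cross-entropy loss, and learning rate are assumed specifics chosen for illustration, not the paper's analyzed configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixture of 3 well-separated Gaussian clusters in the plane, one per class.
means = np.array([[4.0, 0.0], [-4.0, 3.0], [-4.0, -3.0]])
k, d, n_per = 3, 2, 100
X = np.vstack([m + 0.3 * rng.standard_normal((n_per, d)) for m in means])
y = np.repeat(np.arange(k), n_per)

# Overparameterized two-layer ReLU network: width 512 far exceeds what a
# 3-cluster task in 2D requires, so the network could fit arbitrary labels.
m = 512
W1 = rng.standard_normal((d, m)) / np.sqrt(d)
W2 = rng.standard_normal((m, k)) / np.sqrt(m)

def forward(Xb):
    H = np.maximum(Xb @ W1, 0.0)                   # ReLU hidden features
    Z = H @ W2
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    return H, P / P.sum(axis=1, keepdims=True)     # softmax probabilities

lr = 0.1
for step in range(2000):                           # plain SGD, minibatches of 8
    idx = rng.choice(len(X), size=8, replace=False)
    H, P = forward(X[idx])
    G = P.copy()
    G[np.arange(len(idx)), y[idx]] -= 1.0          # dL/dZ for cross-entropy
    gW2 = H.T @ G / len(idx)
    gH = G @ W2.T
    gH[H <= 0] = 0.0                               # ReLU gradient mask
    gW1 = X[idx].T @ gH / len(idx)
    W1 -= lr * gW1
    W2 -= lr * gW2

_, P = forward(X)
train_acc = np.mean(P.argmax(axis=1) == y)
print(train_acc)
```

With well-separated clusters, SGD drives the training accuracy near 1 despite the excess capacity, mirroring the qualitative claim of the abstract; the formal generalization guarantee is, of course, what the paper proves.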